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Abstract 

Background: Segmenting electron microscopy (EM) images of cellular and subcellular processes in the nervous 
system is a key step in many bioimaging pipelines involving classification and labeling of ultrastructures. However, 
fully automated techniques to segment images are often susceptible to noise and heterogeneity in EM images 
(e.g. different histological preparations, different organisms, different brain regions, etc.). Supervised techniques to 
address this problem are often helpful but require large sets of training data, which are often difficult to obtain in 
practice, especially across many conditions. 

Results: We propose a new, principled unsupervised algorithm to segment EM images using a two-step approach: 
edge detection via salient watersheds following by robust region merging. We performed experiments to gather EM 
neuroimages of two organisms (mouse and fruit fly) using different histological preparations and generated manually 
curated ground-truth segmentations. We compared our algorithm against several state-of-the-art unsupervised 
segmentation algorithms and found superior performance using two standard measures of under-and 
over-segmentation error. 

Conclusions: Our algorithm is general and may be applicable to other large-scale segmentation problems for 
bioimages. 

Keywords: Image segmentation, Superpixels, Salient watershed, Region merging, Electron microscopy, 
Unsupervised learning 



Background 

Electron microscopy (EM) images can reveal the phys- 
ical structure of cellular and subcellular processes in 
the nervous system at a fine level of resolution. Accu- 
rately segmenting such images is a key component of 
many bioimage related tasks — including labeling, visu- 
alization, and classification — in structural biology and 
neuroscience [1]. 

However, fully automated methods to segment EM 
images are computationally challenging to develop due to 
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both natural and synthetic noise in the images and irregu- 
larity in cellular structures. Noise can emerge due to vari- 
ations in histological preparations or in the image acqui- 
sition process, or due to natural differences in the brain 
tissue or organisms of interest. This noise is extremely 
difficult to overcome experimentally and thus must be 
accounted for computationally. The physical shape of 
many structures (e.g. neural membranes) can also vary 
widely and do not conform to a standard template for 
detection [2], and intensity and contrast differences may 
also be equally inconsistent across samples. High-quality 
EM images can also be very large (millions to tens of mil- 
lions of pixels), which further constrains the complexity 
of image processing algorithms. While it may be possible 
to fine-tune an algorithm to handle nuances within a spe- 
cific EM preparation, few algorithms have been proposed 
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that can reasonably handle images across a variety of dif- 
ferent imaging conditions and preparations. Supervised 
or semi-supervised techniques are often helpful [3-9], 
but they require large sets of training data, which are 
often difficult to obtain in practice, especially across many 
conditions. 

An important initial step of image segmentation is 
grouping pixels into coherent local regions called super- 
pixels. Running algorithms on the decomposed set of 
superpixels (instead of the original pixels) can aid exist- 
ing supervised or semi-supervised approaches for EM 
segmentation as well as other downstream computer 
vision tasks by simplifying learning and inference. Indeed, 
in recent years, many unsupervised algorithms have 
been proposed to generate superpixels and range from 
graph-based [10-13], to gradient-ascent-basad [14-17], to 
clustering-based approaches [18] (see Achanta et al. [19] 
for review). These algorithms have mostly been tailored 
for processing natural images and are often sensitive to 
variations in image quality and noise that are inherent to 
the EM process. These algorithms also employ different 
constraints and parameters (e.g. different rules to enforce 
regularity of superpixel size and shape, different measures 
of superpixel homogeneity, etc.) designed according to 
their intended application. 

In this paper, we propose a novel, principled unsu- 
pervised segmentation algorithm designed specifically to 
be robust to the types of variation and noise expected 
in EM images of brain tissue. We propose a two-step 
approach: First, we develop a novel watershed variant 
that produces a coarse over-segmentation while strongly 
preserving edges in the image. This is done by using 
Canny [20] and probabilistic boundary [21] edges to 
find high-confidence boundaries, which are then incorpo- 
rated as constraints into the watershed algorithm. Second, 
we design a new region merging algorithm to reduce 
the number of superpixels by merging adjacent regions 
based on a measure of similarity derived from intensity 
and texture features. We formalize the merging prob- 
lem as a graph-theoretic optimization function and use 
an efficient agglomerative greedy algorithm to find a 
final partition into the desired number of superpixels. 
We performed experiments to gather EM images of the 
fruit fly and mouse nervous systems using two differ- 
ent histological preparations. Using two standard mea- 
sures of over- and under-segmentation error, we show 
that our approach offers a significant reduction in the 
number of superpixels while preserving more true bound- 
aries than existing state-of-the-art algorithms (Figure 1). 
We also show qualitative results on several additional 
images. Our results suggest that unsupervised tech- 
niques can be used as a general first-pass technique to 
reduce image complexity without significantly sacrificing 
accuracy. 



Methods 

The salient watershed algorithm 

Given an EM image to segment, the first step is to pro- 
duce accurate boundary-preserving superpixels. While 
many algorithms exist for this purpose, the classi- 
cal watershed algorithm [16] is a natural choice due 
to its ease of use, efficiency, and scalability. Unfortu- 
nately, the standard watershed algorithm suffers from 
two significant problems: over-segmentation and leak- 
age. Over-segmentation can usually be corrected with 
post-processing steps (such as region merging); however, 
to extract regions from EM images that correspond to 
precise cellular structures, fixing leakage in the initial 
segmentation is critical. While dataset-dependent heuris- 
tics may help resolve leakage, this does not address the 
general problem of watershed leakage when segment- 
ing images across different EM preparations and imaging 
conditions. 

To tackle these issues, we propose a novel variant of the 
watershed algorithm called Salient Watershed, The steps 
of our algorithm are: 

1. De-noise the image. We use non-local-means 
smoothing [22] to both reduce the impact of local 
noise when detecting boundaries and to reduce 
unnecessary over-segmentation. In particular, we 
pre-process the original input image I with a 3 x 3 
pixel-wide non-local-means filter [22] to obtain I n i 
(Figure 2B). 

2. Detect high-confidence boundaries. First, we 
apply the Canny edge detector [20] on I n i to obtain 
^jcanny ^ g econc ^ we CO mpute the Pb detector [21] on 

I n l for a coarse estimate of boundary probabilities 

and then we compute an edge map V/^ by 

thresholding/^ at a conservative threshold (1/200). 
Third, we combine these edges into a hybrid salient 
edge map via pixel-wise multiplication: 

yjsalient = yf^ # ^ fanny 2Q y R has been 

previously shown that the probabilistic Pb edge 
detector [21] by itself cannot adequately segment EM 
images without re-training on specific type of images 
[5]. Combining the Canny and Pb boundary 
detectors gives us the ability to find high-likelihood 
salient boundaries that retain precise edge 
localization without resorting to parameter tuning or 
re-training for different kinds of tissue samples. 

3. Elevate watershed levels where Canny and Pb 
coincide. Next, we compute the Euclidean distance 



transform on VI™ lient to obtain If*£ nt and then 



enhance _ e ~2*I l dist 



^salient 



compute an enhanced edge map / 
(Figure 2D). This step elevates the watershed along 
the intersection of Canny and Pb lines and provides 
an exponential fall-off as the distance to these lines 
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A Original Image B Manual Segmentation C Our Algorithm 




D Watershed E SLIC F TurboPixels G Our Algorithm 




Figure 1 Overview and example segmentations. A) Original 1000x1 000-pixel EM image of the fruit fly ventral nerve cord. B) Manual 
ground-truth segmentation. C) The result of our segmentation algorithm after /c-means clustering. D-G) Segmentations of the highlighted region 
in yellow returned by each algorithm using a total of roughly 1 000 superpixels each (boundaries shown in red). Our algorithm better adheres to the 
true edges compared to Watershed, SLIC, and TurboPixels. 



increases. It also helps bridge small gaps that may 
exist in the boundaries. 
4. Run watershed on the enhanced image. Finally, we 
apply the classical watershed algorithm on l enhance to 
obtain the final over-segmented image j^ ance 
(Figure 2E). 

By incorporating the notion of edge saliency into 
the watershed computation, we ensure that salient 
boundaries are preserved. This addresses the leakage 
problem consistently. While this procedure adds addi- 
tional computational complexity to the original water- 
shed procedure, Salient Watershed is a more robust 
algorithm that can be applied to many EM datasets 
to produce a first-pass segmentation without tuning 
parameters. 

This algorithm produces an initial (over-segmented) set 
of superpixels (regions), which are then further collapsed 
using an agglomerative merging algorithm, as described 
below. 

The region merging algorithm 

Region merging is often performed after superpixels are 
generated to collapse neighboring regions. There are three 
aspects to region merging: the features used to repre- 
sent each region, a measure of similarity between regions 
in feature-space, and an objective function for merging 
regions. We describe each of these aspects below. 



Each region is defined by a normalized intensity his- 
togram and a set of normalized texture histograms com- 
puted using pixel values in the region. Texture is an 
important cue used by humans when manually segment- 
ing and annotating EM images [23], and its use has 
become popular in many computer vision tasks today [24]. 
Varma and Zisserman [25] proposed an effective set of 
38 filters (6 orientations x 3 scales x 2 oriented filters + 
2 isotropic filters), but only recorded the maximum filter 
response across each orientation, leading to 8 total filter 
responses at each pixel. Each region is thus represented by 
a b x 9 feature matrix, where = 32 is the number of bins 
in each histogram. 

Most previous approaches compute the similarity 
between two regions in feature-space based on the 
Euclidean or Manhattan distances [26], by comparing 
means and standard deviations of feature vectors [27,28], 
or using information-theoretic measures [29]. The down- 
side of these measures is that they treat each histogram 
bin independently and, as a result, two histograms that 
differ slightly in adjacent bins are treated as equally dis- 
tant as two histograms that differ equally in far-apart 
bins. To avoid this problem, we use the Earth Movers 
Distance (EMD) [30], which computes the minimum cost 
to transform one histogram to exactly match the other 
using transformation costs that depend on the linear dis- 
tance between bins. EMD can be solved quickly using a 
constrained bipartite network flow routine [31]. Overall, 
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A Original image B Non-local means filter 




C Intersecting Canny and Pb edges D Enhancing the boundaries 




E Salient Watershed output 




Figure 2 The Salient Watershed algorithm. A) The original input image. B) Non-local means filter applied to de-noise the image. C) Detecting 
high-confidence boundary edges. D) Elevating watershed levels where Canny and Pb coincide. E) Final watershed output on the enhanced image. 
See 3 for full description of each step. 



the similarity between two adjacent regions r and / is 
defined as: 

s(r, r') = exp(- min(r size , r' size )) + exp(-EMD(Int r , Inty) 

8 

- a ^ EMD(Text r ,/, Text r /,0)> (1) 

i=l 



where the first term biases towards collapsing smaller 
regions; Int r is the normalized intensity histogram of 
region r; Text r> ; is the z th normalized texture histogram of 
region r; and a is a parameter to weigh the contribution 
of the texture component (we set a = 1/8). We use EMD 
to compute the similarity between both normalized fea- 
tures (intensity and texture), and thus born terms lie on 
roughly the same scale. 
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The final aspect of the algorithm is the merging opti- 
mization function [26]. We define a predicate V that 
states that every region r should be "sufficiently" different 
compared to each of its neighbors. Formally: 



true if s(r, r') < r, Vr r e N(r) 
false otherwise, 



Algorithm 1 Region-Merging ( I , L , NumSPs ) 
l: === Compute RAG and region features === 
2: G <— region_adj_graph (L) 
3: Reg2Intensity <- 

region_intensity_histogram (I , L) 
4: Reg2Texture <— region_text_histograms 
(I,L) 

5: 

6: === Populate heap with edge costs === 

7: for allu, v e E (G) do 

8: H <- push (s (u, v) ,u, v) 

9: end for 

10: 

li: === Iteratively merge most similar regions === 

12: while |G| < NumSPs do 

13: == Find the most similar pair of neighboring 
regions 

14: (S,U,V) <- H.popO 
15: 

16: == Remove the (u,v) boundary and any heap 

entries with u or v == 

17: G <— contract_edge (G, u, v) 

18: L <— remove_boundary ( L , u , v) 

19: H <- remove_all (H,u) 

20: H <- remove_all (H, v) 

21: 

22: == Compute features for new region, uv == 
23: Reg2Intensity <— 

update_intensity ( I , L, u, v, uv) 
24: Reg2Texture <- 

update_texture (I , L, u, v, uv) 

25: 

26: == Add heap entries for new region uv == 
27: for all x e G . neighbors (uv) do 
28: H <— push ( s (uv, x) , uv, x) 
29: end for 
30: end while 
31: return G 

Note: The function s(u,v) computes the similarity 
between regions u and v using Reg2 Intensity and 
Reg2 Texture (see Equation 1). I is the original image 
and L is the label matrix that defines region boundaries. 



where N(r) are the regions adjacent to r. If this state- 
ment is true for region r, we call r an "island". We seek to 
find a segmentation such that V holds for every region. 
In graph-theoretic terms, we start with the region adja- 
cency graph G = (V,E), defined by nodes V (regions) 
and with edges E connecting adjacent regions. To merge 
two regions means to contract the edge between them; 
our goal is thus to find a set of edges whose contrac- 
tion results in a graph satisfying V for every region. 
We find such a set using a greedy agglomerative algo- 
rithm: we start with the regions produced by the Salient 
Watershed algorithm, and iteratively merge the pair of 
neighboring regions that are most similar. This process 
can stop either when the similarity between any two 
adjacent regions is < r (at which point every region is 
guaranteed to be an island according to r) or when the 
desired number of superpixels is met (as we do here). 
Pseudocode of the region merging algorithm is shown in 
Algorithm 1. 

Comparing segmentations versus ground-truth 

To evaluate performance, we performed experiments and 
collected three 1000 x 1000-pixel EM images of the ner- 
vous system: 2 images were from the fruit fly ventral nerve 
cord fixed using a high pressure freezing (HPF) protocol, 
and 1 image was from the mouse cortex using a perfusion 
DAB-based protocol (e.g. [32]). We manually segmented 
membranes, mitochondria, and other neuronal structures 
in these images (Figure 1A and IB) and extracted ground- 
truth boundary matrices for each. We also collected two 
additional images of the mouse cortex using HPF, which 
we analyzed qualitatively. 

To compare an algorithms segmentation P with the 
ground- truth Q, we use two standard metrics: the asym- 
metric partition distance (APD) and the symmetric par- 
tition distance (SPD) [33]. APD(P, Q) computes, over all 
regions r e P, the maximum percentage of pixels in r 
that map onto a single ground-truth segment. SPD(P, Q) 
finds the maximal matching between regions in P and Q 
and computes the overall percentage of pixels that must 
be deleted from both images in order to make each pair 
of matched regions equivalent. APD penalizes "spill-over" 
of segments across ground-truth boundaries, but does 
not penalize over-segmentation. On the other hand, SPD 
measures exact 1-1 correspondence between segmenta- 
tions and does penalize over-segmentation. We report 
1-SPD(P, Q) as a percentage, so in both measures higher 
percentages are better. 

Results and discussion 

We compared our algorithm against TurboPixels [17] and 
a MATLAB implementation of SLIC [19,34]. TurboPixels 
uses geometric flows to find regions that are approxi- 
mately uniform in size and shape while also preserving 
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smooth boundary edges, and it is specifically designed 
to produce high-quality over-segmentations. SLIC is a 
clustering method based on /c-means that was shown to 
be superior to several graph-based and gradient-ascent- 
based algorithms on segmenting mitochondria in EM 
images [18]. It was also recently shown in a large-scale 
comparison to be amongst the best performing algorithms 
on the Berkeley segmentation dataset [19], and thus repre- 
sents the current state-of-the-art. We ran each algorithm 
on our EM images and varied the number of superpix- 
els returned by adjusting parameters in the algorithm. For 
each segmentation, we computed the over- and under- 
segmentation error (SPD and APD, respectively). 

Our algorithm more strictly adheres to true boundaries 
compared to the other algorithms across nearly the entire 
range of superpixels (Figure 1D-G and Figure 3A). For 
example, at roughly 2000 superpixels on the first fruit fly 
image, our algorithm has an APD of 93.72% compared to 
88.98% for TurboPixels and 86.89% for SLIC. Thus, we can 
achieve over three orders of magnitude reduction in the 
number of superpixels (compared to the original image) 
while still preserving over 90% of the true boundaries. 
Some predicted boundary contours may indeed be correct 
but do not align exactly with the ground-truth bound- 
aries; thus, this value actually represents a lower-bound 
on performance. In practice, over-segmentation is often 
more permissive than under-segmentation because it is 
relatively easy for downstream analyses to specify addi- 
tional merges (e.g. via classification) but more difficult and 
labor-intensive to reconstruct a lost boundary. 

Our algorithm also outperforms the other methods 
in extracting true regions in their entirety (Figure 3B). 
The SPD penalizes over-segmentation and measures exact 
concordance between the ground-truth and algorithm 
partitions. At 1000 superpixels, our algorithm has an aver- 
age SPD of almost 50% compared to 7% (TurboPixels) 



and 30% (SLIC). This means that half the pixels in our 
partition are exactly matched to ground- truth regions. 
Our ground-truth was constructed to consider entire 
membranes as single regions (as a biologist might), but 
there may be small substructures within membranes 
that persist due their markedly different features. These 
regions will naturally be left unmerged by each algorithm; 
a more fine-grained ground-truth segmentation would 
thus increase these percentages further. TurboPixels espe- 
cially suffers on the SPD measure because it generates 
regular- and grid-like superpixels (Figure IF); EM images, 
however, contain many irregularly-shaped structures that 
do not fit this mold. 

Our algorithm and SLIC perform similarly under both 
metrics when the number of desired superpixels is large 
(Figure 3 at 10,000 superpixels), but diverge as fewer 
superpixels are requested. This suggests that both meth- 
ods may be comparable at high numbers of superpix- 
els, but that our region merging algorithm is more 
robust at preserving boundaries than the clustering-based 
approach used by SLIC. 

We also compared our Salient Watershed algorithm to 
the classical watershed algorithm [16]. On the first image, 
for example, the latter produced a segmentation with 
43,252 regions and an APD of 94.17%. Salient Water- 
shed produced a segmentation with 13,252 regions and 
an APD of 95.25%. APD can not increase with subse- 
quent merges; the fact that our segmentation produces 
a higher APD with more than 3x fewer regions testi- 
fies to the strong edge-preserving property of our salient 
watersheds. 

Next, to determine whether our superpixels may 
be used for classification, we took the 1000 superpix- 
els generated by our algorithm and clustered them in 
feature-space using /c-means (Figure 1C). Co-clustered 
regions were assigned the same color (we used k = 13 
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Figure 3 Under- and over-segmentation error of each algorithm with respect to ground-truth. The average and standard deviation of 
A) APD and B) SPD for each algorithm on our EM benchmark dataset. Overall, our algorithm preserves more true ground-truth boundaries (APD) 
and better captures true ground-truth segments within a single region (SPD) compared to TurboPixels and SLIC. 
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but found similar results for many k). Visual inspection 
shows that indeed many similar structures — in particular 
mitochondria (light green) and membranes (purple) — 
are similarly colored. This implies that the superpixels 
that comprise these regions represent homogeneous 
biologically structures and that they are well-separated 
by intervening boundaries in feature space. This clus- 
tering represent a first-pass unsupervised labeling of 
EM images that can be further improved via supervised 
techniques [3,5]. 

Finally, we demonstrate the performance of our algo- 
rithm versus SLIC and TurboPixels qualitatively on two 
additional images of the mouse cortex prepared using 
high pressure freezing (Figure 4). The previous images 
of the mouse cortex were obtained using DAB. With- 
out altering any parameters, we ran each algorithm using 
2,000 superpixels and visually compared the predicted 
boundaries. As with the previous images, our method pre- 
serves intricate membrane boundaries much better than 



the other techniques and produces more homogeneous 
regions. We also find superior performance when cap- 
turing irregularly-shaped regions, and we are better able 
to separate regions that are separated by a thin bound- 
ary (e.g. two membrane boundaries that lie adjacent to 
one another; Figure 4). Both of these types of heterogene- 
ity are widespread in EM images and not easily captured 
by methods that make assumptions about edge proper- 
ties or the distribution of noise in EM images [28]. This 
further suggests that our unsupervised approach is robust 
to some natural variations caused by different histological 
preparations in EM neuroimages. 

Conclusions 

Accurately segmenting electron microscopy images is an 
important problem for many neuroimage related tasks, 
but it also presents several computational challenges 
due to the noise and variation inherent in tissue sam- 
ples and in the EM chemistry and image acquisition 




Figure 4 Qualitative results on two additional images. We ran SLIC, TurboPixels, and our algorithm on two additional images of the mouse 
cortex prepared using a high-pressure freezing EM protocol. Our approach again preserves boundaries and edges with more fidelity than the other 
methods, despite no adjustment of parameters. 
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processes. We presented an unsupervised algorithm to 
generate boundary-preserving superpixels by combin- 
ing a salient watershed algorithm with robust region 
merging. On a benchmark dataset of noisy EM images, 
our algorithm outperformed two state-of-the-art meth- 
ods using two standard measures of over- and under- 
segmentation error. While our method has additional 
computational complexity, we place emphasis on accu- 
racy and contend that downstream time spent in EM 
image analysis will be reduced through more accurate 
segmentations. 

While aspects of this general pipeline for segmentation 
(edge detection, watershed, region merging) have been 
used in previous works [8,9,28], the specific sequence of 
steps as outlined in this paper is novel. This combination 
of components offers our unsupervised approach a level 
of generality and robustness that can handle many types 
of noise present in heterogeneous EM data. Our approach 
also uses few parameters and may be usable across differ- 
ent EM histological preparations and for other large-scale 
bioimage segmentation problems (e.g. segmentation of 
cells, nuclei, or proteins within fluorescence microscopy 
images). 
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